Multi-Class Classification

Core Concept

Multi-class classification extends binary classification to handle three or more mutually exclusive categories, where each instance belongs to exactly one class. The model must learn decision boundaries that partition the feature space into multiple regions, one for each class. Examples include digit recognition (0–9), document categorisation by topic, species identification, or medical diagnosis across multiple disease types. This represents a more complex setting than binary classification: instead of a single boundary separating two outcomes, the model must distinguish among many alternatives while preserving the constraint that predictions are mutually exclusive.
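
A minimal sketch of the setting, assuming scikit-learn and its bundled iris dataset (a three-class problem); any natively multi-class estimator would serve:

    # One model, three mutually exclusive classes: each instance gets exactly one label.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)                # y takes values 0, 1, 2
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(clf.predict(X_test[:5]))                   # exactly one class per instance
    print(clf.score(X_test, y_test))                 # overall accuracy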

Key Characteristics

  • Multiple decision boundaries: The feature space is partitioned into C regions for C classes. Depending on the algorithm, this may be achieved by learning one boundary per class (e.g. One-vs-Rest), pairwise boundaries (One-vs-One), or a single joint partitioning (e.g. decision trees, neural networks with softmax).
  • Decomposition strategies: Many binary algorithms are extended to multi-class via One-vs-Rest (OvR), which trains one classifier per class (that class vs all others) and selects the class with the highest confidence, or One-vs-One (OvO), which trains a binary classifier for every pair of classes (C(C−1)/2 classifiers for C classes) and uses voting for the final prediction. Some algorithms handle multiple classes natively without decomposition. Both wrappers are sketched in code after this list.
  • Softmax and cross-entropy: Neural networks for multi-class classification typically use a softmax output layer to convert logits into a probability distribution over all classes that sums to 1, and training usually employs cross-entropy loss. The predicted class is the one with the highest probability; the full distribution provides uncertainty information (see the softmax sketch after this list).
  • Evaluation metrics: Accuracy remains the proportion of correct predictions overall. Confusion matrices become C×C, showing predicted vs actual class and which classes are commonly confused. Macro-averaging computes a metric per class and then averages, treating classes equally; micro-averaging aggregates counts across all classes, so frequent classes weigh more; weighted averaging averages per-class scores weighted by class support (a worked metrics sketch follows this list).
  • Class imbalance and hierarchy: Imbalance is more complex with multiple classes, as some may be well represented while others are rare (a class-weighting sketch follows this list). Hierarchical classification can help when classes have natural groupings (e.g. first mammal/bird/reptile, then species). Error costs may differ between class pairs (e.g. a benign vs malignant misclassification).
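
A sketch of the two decomposition strategies, assuming scikit-learn's OneVsRestClassifier and OneVsOneClassifier wrappers around a binary linear SVM:

    # OvR trains C classifiers (class vs rest); OvO trains C(C-1)/2 pairwise classifiers.
    from sklearn.datasets import load_iris
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
    from sklearn.svm import LinearSVC

    X, y = load_iris(return_X_y=True)                    # C = 3 classes

    ovr = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)
    ovo = OneVsOneClassifier(LinearSVC(max_iter=10000)).fit(X, y)

    print(len(ovr.estimators_))    # 3 classifiers: one per class
    print(len(ovo.estimators_))    # 3 * 2 / 2 = 3 pairwise classifiers
    print(ovr.predict(X[:3]), ovo.predict(X[:3]))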
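
The softmax and cross-entropy machinery can also be written out directly; a minimal NumPy sketch (the logits values here are invented for illustration):

    import numpy as np

    def softmax(logits):
        # Subtract the row max for numerical stability; each row then sums to 1.
        z = logits - logits.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def cross_entropy(probs, labels):
        # Mean negative log-probability assigned to the true class.
        n = probs.shape[0]
        return -np.log(probs[np.arange(n), labels] + 1e-12).mean()

    logits = np.array([[2.0, 0.5, -1.0],
                       [0.1, 0.2,  3.0]])    # 2 examples, 3 classes
    probs = softmax(logits)
    print(probs.sum(axis=1))                 # [1. 1.]: a distribution per example
    print(probs.argmax(axis=1))              # predicted class = highest probability
    print(cross_entropy(probs, np.array([0, 2])))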
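
A worked sketch of the metrics above, assuming scikit-learn's metrics module (the y_true/y_pred arrays are toy values):

    import numpy as np
    from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

    y_true = np.array([0, 0, 1, 1, 2, 2, 2, 2])   # 3 classes, class 2 most frequent
    y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 2])

    print(accuracy_score(y_true, y_pred))         # proportion correct overall
    print(confusion_matrix(y_true, y_pred))       # 3x3: rows = actual, cols = predicted

    # macro: per-class F1 averaged with equal weight per class
    # micro: counts aggregated first, so frequent classes weigh more
    # weighted: per-class F1 averaged by class support
    for avg in ("macro", "micro", "weighted"):
        print(avg, f1_score(y_true, y_pred, average=avg))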
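
For the imbalance point, one common mitigation is to re-weight classes inversely to their frequency; a sketch assuming scikit-learn's class_weight option and a synthetic skewed dataset:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Synthetic 3-class problem where class 2 is rare (weights are class priors).
    X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6,
                               weights=[0.8, 0.15, 0.05], random_state=0)

    # class_weight="balanced" scales each class by n_samples / (C * count_c),
    # so mistakes on the rare class cost more during training.
    clf = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X, y)
    print(clf.predict(X[:5]))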

Common Applications

  • Digit and character recognition: Assigning images or signals to one of 10 digits or a set of character classes
  • Document categorisation: Assigning documents to one topic or category from a fixed set (e.g. news, sports, science)
  • Species identification: Classifying specimens or images into one of many species or taxa
  • Medical diagnosis (multiple conditions): Determining which of several disease types or conditions is present from patient data
  • Intent classification: Mapping user utterances or queries to one of several predefined intents
  • Product categorisation: Placing items into a single category in a taxonomy
  • Gesture or activity recognition: Classifying signals or video into one of several gestures or activities

Multi-Class Classification Algorithms

Multi-class classification algorithms either extend binary methods via decomposition (One-vs-Rest, One-vs-One) or softmax-style outputs, or handle multiple classes natively. The choice depends on the complexity of the decision boundaries, interpretability, scalability in the number of classes, and how well the method copes with class imbalance or hierarchical structure.
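
As a closing sketch, some models need no decomposition at all; a decision tree (here via scikit-learn) partitions the feature space for all classes in a single joint model:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # One tree handles all three classes jointly: each leaf is assigned a class,
    # so the splits carve the feature space into per-class regions directly.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    print(tree.predict(X[:5]))
    print(tree.predict_proba(X[:2]))   # per-class probabilities from leaf frequencies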